Abstractive Text Summarization (1)

  • Ali Ghazal
  • Sarah ElNaggar

Problem Statement

Abstractive Text Summarization is the task of generating a short and concise summary that captures the salient ideas of the source text. The generated summaries potentially contain new phrases and sentences that may not appear in the source text.

Dataset

The Multilingual Amazon Reviews Corpus

Contains reviews of Amazon products. Each review consists of a title (used as the summary) and a review body (used as the document). We used this corpus to fine-tune our model for summarization.

Training: 20,000
Validation: 5,000
Test: 5,000

XSUM

Contains BBC news articles, each paired with a professionally written, extreme (single-sentence) summary.

Training: 204,045
Validation: 11,332
Test: 11,332
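
Both corpora are available through the Hugging Face `datasets` hub. The following is a minimal sketch of how the splits above could be loaded; the dataset identifiers, field names, and the subsampling of the Amazon corpus are assumptions based on the public Hub versions, not necessarily our exact preprocessing script.

```python
# Minimal sketch of loading the two corpora with the Hugging Face `datasets` library.
from datasets import load_dataset

# XSUM: BBC articles ("document") paired with one-sentence summaries ("summary").
xsum = load_dataset("xsum")
print(xsum)  # train: 204,045 / validation: 11,332 / test: 11,332

# Multilingual Amazon Reviews (English): "review_body" is the document,
# "review_title" the summary. We subsample to the sizes used in this report.
amazon = load_dataset("amazon_reviews_multi", "en")
amazon_train = amazon["train"].shuffle(seed=42).select(range(20_000))
amazon_val = amazon["validation"].select(range(5_000))
amazon_test = amazon["test"].select(range(5_000))
```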


Input/Output Examples

Input

Jordan Hill, Brittany Covington and Tesfaye Cooper, all 18, and Tanishia Covington, 24, appeared in a Chicago court on Friday.
The four have been charged with hate crimes and aggravated kidnapping and battery, among other things.
An online fundraiser for their victim has collected $51,000 (£42,500) so far.
Denying the four suspects bail, Judge Maria Kuriakos Ciesil asked: "Where was your sense of decency?"
Prosecutors told the court the beating started in a van and continued at a house, where the suspects allegedly forced the 18-year-old white victim, who suffers from schizophrenia and attention deficit disorder, to drink toilet water and kiss the floor.
Police allege the van was earlier stolen by Mr Hill, who is also accused of demanding $300 from the victim's mother while they held him captive, according to the Chicago Tribune.
The court was also told the suspects stuffed a sock into his mouth, taped his mouth shut and bound his hands with a belt.
In a video made for Facebook Live which was watched millions of times, the assailants can be heard making derogatory statements against white people and Donald Trump.
The victim had been dropped off at a McDonald's to meet Mr Hill - who was one of his friends - on 31 December.
He was found by a police officer on Tuesday, 3 January, a day after he was reported missing by his parents.
Prosecutors say the suspects each face two hate crimes counts, one because of the victim's race and the other because of his disabilities.

Output

Four people accused of kidnapping and torturing a mentally disabled man in a "racially motivated" attack streamed on Facebook have been denied bail.

State of the art

At the time of writing, large pretrained encoder-decoder Transformers define the state of the art for abstractive summarization. On XSUM, PEGASUS-large reports ROUGE-1 scores in the high 40s and BART-large in the mid 40s, with fine-tuned T5 models close behind; all of these substantially outperform earlier RNN sequence-to-sequence and pointer-generator baselines.



Original Model from Literature

Various models have been proposed for abstractive text summarization, such as BERT-based summarizers, BART, PEGASUS, and T5. These models couple an encoder-decoder (or encoder-based) architecture with an attention mechanism, and they reached state-of-the-art performance on summarization through large-scale pretraining of powerful language models. For example, BART was pretrained on a variety of denoising objectives such as text infilling, sentence deletion, and sentence permutation. The pretrained models are then adapted to downstream tasks by fine-tuning their weights on the dataset of interest. There are many considerations when selecting a model; our main criteria were a small model size and ease of training, since we wanted to be able to get quick feedback. Taking this into account, we decided to work with T5-small, the T5 variant with the smallest number of encoder-decoder layers. The T5 model was introduced in the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", and the same model was pretrained to perform many downstream tasks such as translation, question answering, and summarization.
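
As a concrete starting point, the pretrained T5-small checkpoint can be used for summarization out of the box through Hugging Face Transformers. The sketch below is illustrative only; the checkpoint name and the "summarize: " task prefix follow standard T5 usage, and the generation settings are assumptions rather than our exact script.

```python
# Minimal sketch of zero-shot summarization with the pretrained t5-small checkpoint.
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

article = "Jordan Hill, Brittany Covington and Tesfaye Cooper, all 18, ..."
inputs = tokenizer("summarize: " + article, return_tensors="pt",
                   max_length=512, truncation=True)

summary_ids = model.generate(**inputs, num_beams=4, max_length=64,
                             length_penalty=2.0, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```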



The main architecture of T5-small is as follows. It is a standard Transformer encoder-decoder. The feed-forward blocks use the ReLU activation function by default; however, we also experimented with a gated-GLU (GeGLU) activation, which has been proposed as a drop-in replacement for ReLU in Transformer feed-forward layers. Lastly, the model uses self-attention in both the encoder and the decoder; we also experimented with the cross-attention mechanism (see Update #4 below).
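
For reference, these architectural choices are exposed through the model configuration in Hugging Face Transformers. The sketch below inspects the published t5-small defaults (the values in the comments are those defaults, not our modified configuration):

```python
# Inspecting the main architectural knobs of t5-small via its configuration.
from transformers import T5Config

config = T5Config.from_pretrained("t5-small")
print(config.num_layers)          # 6 encoder blocks
print(config.num_decoder_layers)  # 6 decoder blocks
print(config.num_heads)           # 8 attention heads
print(config.d_model)             # 512 hidden size
print(config.feed_forward_proj)   # "relu" (default activation; "gated-gelu" is the GLU variant)
```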


Proposed Updates

We made the following updates to the baseline T5-small model; a configuration sketch follows the list.

Update #1: Increase the beam width used in beam search from 4 to 8.
Update #2: Increase the number of encoder/decoder layers of the T5-small model from the default 6 to 8.
Update #3: Change the activation function from ReLU to gated-GLU (GeGLU).
Update #4: Add a cross-attention mechanism.
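
A hedged sketch of how Updates #1-#3 could be expressed through Hugging Face Transformers is shown below; Update #4 required changes to the model code itself and is only noted in a comment. Only the values named in the list above come from our experiments; model initialization, the input text, and decoding lengths are illustrative.

```python
# Sketch of Updates #1-#3 expressed as a Hugging Face configuration and generation call.
from transformers import T5Config, T5ForConditionalGeneration, T5Tokenizer

config = T5Config.from_pretrained(
    "t5-small",
    num_layers=8,                    # Update #2: 8 encoder blocks
    num_decoder_layers=8,            #            8 decoder blocks
    feed_forward_proj="gated-gelu",  # Update #3: gated-GLU instead of ReLU
)
# Update #4 (the cross-attention change) required editing the model code
# directly and is not expressible as a configuration flag.
model = T5ForConditionalGeneration(config)  # builds a randomly initialised model with the modified architecture
tokenizer = T5Tokenizer.from_pretrained("t5-small")

# Update #1: widen beam search from 4 to 8 beams at generation time.
inputs = tokenizer("summarize: <document text>", return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, num_beams=8, max_length=64, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```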

Results
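
All of the experiments below are scored with ROUGE-1 and ROUGE-2. A minimal sketch of computing these scores with the `evaluate` library is shown here; the prediction/reference strings are placeholders.

```python
# Computing ROUGE-1 and ROUGE-2 with the `evaluate` library.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["four people were denied bail over a kidnapping streamed on facebook"]
references = ["Four people accused of kidnapping and torturing a mentally disabled man "
              "in a \"racially motivated\" attack streamed on Facebook have been denied bail."]

scores = rouge.compute(predictions=predictions, references=references)
print(scores["rouge1"], scores["rouge2"])  # ROUGE-1 and ROUGE-2 F-scores
```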

Staged Training

ROUGE-1 and ROUGE-2 score plots.

Activation

ROUGE-1 and ROUGE-2 score plots.

Cross Attention

ROUGE-1 and ROUGE-2 score plots.

Depth

ROUGE-1 and ROUGE-2 score plots.

Number of Beams

ROUGE-1 and ROUGE-2 score plots.

Training/Validation Loss
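
Assuming training is run with a Hugging Face Seq2SeqTrainer (see the technical report below), the training and validation loss curves can be recovered from the trainer's log history. The following is an illustrative sketch, not our exact plotting code, and it assumes a fitted `trainer` object.

```python
# Sketch: plotting training/validation loss from the Trainer's log history.
import matplotlib.pyplot as plt

history = trainer.state.log_history  # list of logged dicts from a finished training run
train = [(h["step"], h["loss"]) for h in history if "loss" in h]
val = [(h["step"], h["eval_loss"]) for h in history if "eval_loss" in h]

plt.plot(*zip(*train), label="training loss")
plt.plot(*zip(*val), label="validation loss")
plt.xlabel("step")
plt.ylabel("cross-entropy loss")
plt.legend()
plt.show()
```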





Technical report

The main training details are listed below; a Trainer configuration sketch follows the list.

  • Programming framework: Python, TensorFlow, Hugging Face Transformers
  • Training hardware: Google Colab; Alienware Aurora R11 with 2× NVIDIA RTX 2060 GPUs
  • Training time: 8 hours
  • Number of epochs: 10
  • Time per epoch: 52 mins
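
The sketch below shows how these settings might map onto a Hugging Face Seq2SeqTrainer configuration; the epoch count matches the list above, while the output directory, batch size, and the tokenized dataset variables are assumptions for illustration (they refer back to the model, tokenizer, and datasets from the earlier sketches).

```python
# Illustrative fine-tuning setup with Hugging Face Transformers.
from transformers import (Seq2SeqTrainingArguments, Seq2SeqTrainer,
                          DataCollatorForSeq2Seq)

args = Seq2SeqTrainingArguments(
    output_dir="t5-small-summarization",
    num_train_epochs=10,               # ~52 minutes per epoch on our hardware
    per_device_train_batch_size=8,
    evaluation_strategy="epoch",
    predict_with_generate=True,
    fp16=True,                         # mixed precision on the RTX 2060 GPUs
)

trainer = Seq2SeqTrainer(
    model=model,                       # the (possibly modified) T5 model from above
    args=args,
    train_dataset=tokenized_train,     # tokenized splits; preprocessing not shown
    eval_dataset=tokenized_val,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```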

Conclusion

Throughout this project, we built an abstractive summarization deep learning model. We experimented with different models but ultimately chose T5. We then experimented with several hyper-parameters and architectural choices, including the number of beams, the number of layers, the activation function, the cross-attention mechanism, and staged training. Based on our results, we propose using the gated-GLU (GeGLU) activation function, the cross-attention mechanism, and a larger number of beams in beam search.

References


  1. Abstractive text summarization. Papers With Code. (n.d.). Retrieved February 17, 2022, from https://paperswithcode.com/task/abstractive-text-summarization
  2. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems 30. https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf
  3. Liu, Y., & Lapata, M. (2019). Text summarization with pretrained encoders. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). https://doi.org/10.18653/v1/d19-1387
  4. Nallapati, R., Zhou, B., dos Santos, C., Gulcehre, C., & Xiang, B. (2016). Abstractive text summarization using sequence-to-sequence RNNs and beyond. Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning. https://doi.org/10.18653/v1/k16-1028
  5. See, A., Liu, P. J., & Manning, C. D. (2017). Get to the point: Summarization with Pointer-Generator Networks. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). https://doi.org/10.18653/v1/p17-1099